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The need for large-scale electronic structure calculations arises recently in the field of material 
physics and efficient and accurate algebraic methods for large simultaneous linear equations become 
greatly important. We investigate the generalized shifted conjugate orthogonal conjugate gradient 
method, the generalized Lanczos method and the generalized Arnoldi method. They are the solver 
methods of large simultaneous linear equations of one-electron Schrodinger equation and maps the 
whole Hilbert space to a small subspace called the Krylov subspace. These methods are applied 
to systems of fee An with the NRL tight-binding Hamiltonian (Phys. Rev. B 63, 195101 (2001)). 
We compare results by these methods and the exact calculation and show them equally accurate. 
The system size dependence of the CPU time is also discussed. The generalized Lanczos method 
and the generalized Arnoldi method are the most suitable for the large-scale molecular dynamics 
simulations from the view point of CPU time and memory size. 

PACS numbers: 71.15.-m, 02.70.Ns, 71.15.Pd 



I. INTRODUCTION 



solving simultaneous linear equations 



In recent years, molecular dynamics (MD) simulations 
with electronic structure calculations in nano-scale struc- 
tures have attracted much attention. One needs a large 
size of systems of several hundred thousands atoms with 
a few hundred pico-seconds (or more longer time) process 
in order to investigate characteristics of nano-scale sys- 
tems such as phenomena of competition between different 
physical principles or phenomena of the multi-physics, 
e.g. energy competition between the strain field and 
chemical bondsi^"— Several requirements for large-scale 
MD simulation with electronic structure calculations are 
contradictory to each other, e.g. total energy accuracy 
vs. larger system size or longer physical time of processes. 

There are several approaches for large-scale MD sim- 
ulations)^ (a) the Fermi operator expansion^S (b) the 
divide-and-conquer method^ and (c) the minimization 
method (the density matrix minimization^ or the wave- 
function minimization^). Another classification may be 
the one according as the basis set of wavefunctions; (a) 
the plane wave basis set and switching between the real- 
space and fc-space representatioufiS and (b) localized or- 
bitalsii or tight-binding basis setJ^ Computation with 
"massively parallel machine" is also an important issue. 

An important aspect is development of novel algebraic 
algorithm for extra-large scale systems. The most gen- 
eral and important algorithm may be the linear algebra 



(z — H)x = b, 



(1) 



where H is self-adjoint or real symmetric matrix, b is 
a given vector, z = e -I- iry, e is an energy parameter 
and r/ is an infinitesimally small positive number, respec- 
tively. Solutions of Eq.(IT|) relate to the standard eigen- 
value problem (e — H)x — 0. We developed the subspace 
diagonalization method and the shifted conjugate orthog- 
onal conjugate gradient (COCG) method. "'^-liS, Then the 
methods were applied to the fracture propagation and 
surface formation in Si crystals with the tight-binding 
Hamiltonian based on an orthogonal basis setiii^ On the 
other hand, since its Hamiltonian is described by the 
tight-binding Hamiltonian based on a non-orthogonal ba- 
sis set, the problem of the formation of Au multishell he- 
lical nanowires was solved by the exact diagonalization 
methodj^i^ 

Development of efficient linear algebraic methods has 
been, so far, mainly based on the orthogonal basis 
setSi ^^'^^i^^" — However, localized basis wavefunctions are 
generally non-orthogonal and it is much desirable to gen- 
eralize the methods to the case of a non-orthogonal basis 
set. The most popular strategy of the generalized eigen- 
value problem (represented by the non-orthogonal basis 
set) would be the transformation to the standard eigen- 
value problemJ^ Our target in the present paper is to 
solve simultaneous linear equations with self-adjoint or 
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real symmetric matrix S; 

{zS-H}x = b, (2) 

which relates to the generalized eigen-value problem 
{eS — H)x = 0. We will investigate efficient methods 
of solving Eq. ([2]) with a complex energy variable z when 
the matrix size of H and S is huge. Several algebraic al- 
gorithms will be discussed and directly applied to a tight- 
binding Hamiltonian based on non-orthogonal atomic or- 
bitals in large-scale systems. 

The structure of the present paper is as follows. In 
Section |lTl the idea of non-orthogonal atomic orbitals 
and physical properties (e.g. the band energy, the lo- 
cal/partial density of states, numbers of occupied elec- 
tron states, the chemical potential et al.) are summa- 
rized. Sections IIIIl IIVI and |V] explain three different al- 
gorithms of large-scale linear equations, i.e. the gen- 
eralized shifted conjugate orthogonal conjugate gradi- 
ent method (GsCOCG), the generalized Lanczos method 
and the generalized Arnoldi method which generate the 
Krylov subspace from the whole Hilbert space. In these 
sections, numerical examples are presented by using the 
NRL tight-binding Hamiltonian. The generalized Lanc- 
zos method becomes applicable to actual large systems 
with a high accuracy if one use the modified Gram- 
Schmidt reorthogonalization to maintain the orthogonal- 
ity of generated basis vectors. In Section IVTl we compare 
the CPU-times of each algorithm and discuss the applica- 
bility to large-scale electronic structure calculations and 
MD simulations. Section IVlII is conclusions. The exam- 
ples without reorthogonalization in the generalized Lanc- 
zos method are shown and discussed in Appendix [X] Ap- 
pendix [B] is devoted to discuss the consistency between 
the total energy and force. 

II. THEORETICAL BACKGROUND 

A. Non-orthogonal basis set and 
S-orthogonalization 

We define two sets of wavefunctions, {0i(r)} and 
{V'q (*")}, where is the non-orthogonal (normal- 

ized) basis set (e.g. atomic orbitals and 'i' denotes an 
atomic site and energy level), and {^l)a{f)} is the or- 
thonormalized basis set. Then the overlap matrix Sij 
and the Hamiltonian matrix Hij are defined as 

Sij = {(l)i\<j)j) = J <j)i<j)]dr, Sii^l, (3) 

H,, = (cf^mcf^,) = J 4>*H^jdr , (4) 

where H is the Hamiltonian operator. The orthonormal 
basis set {ipair)} can be expanded in terms of {4>j} as 



and the orthogonality relation is expressed as 

{MM = J2w[''^*wfs,, (6) 

= (ti;("),ti;(«)s = <5„^ , (7) 

where lo^") — {w[°'\'W2°'\ )*. We call the repre- 
sentation {w^°'\ w^^^f ) s the "S-product" and the relation 
Eq. (O the "S-orthogonalization" of basis vectors lo^"^ 
When "ipair) satisfies the Schrodinger equation 

Hlpair) ^ Eai'air), (8) 

coefficients {w^"'^} should be elements of an eigen- vector 
of a simultaneous linear equation in the 0-representation; 

Y^H.M^^^e^T.S^M"^ (9) 

i i 

or, in matrix-vector form, 

iJu;(") = £„5t(;("^ (10) 

Matrices H = [Hij] and S ~ (Sij) are self-adjoint in 
(/>-representation. 



B. Green's function and local/partial density of 
states represented by the non-orthogonal basis set 

The Green's operator G is defined as 

G{z)^{{e + tT^)i-H}-\ (11) 

where 1 is the identity operator and z — e + irj. Elements 
of the Green's function matrix can, then, be written as 

G,,(z) - {cf,,\G{z)\cf,,)^{S{zS-HMS},, (12) 

= E^'4E^r-^-r}^y- as) 

k,l a 

The local (partial) densities of states is expressed in the 
(/>-representation as follows: 

A,(£) = --lm{G{z)S-%- (14) 

TT 

The normalization of the Green's functions and the lo- 
cal/partial density of states (DOS) is then 

(-^)/ c«£lniG,,(z) = (0,|0j) (15) 

/OO 
deDuie) - 1. (16) 
-OO 
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C. Total band energy and Green's function 

1. Density matrix and energy density matrix 

In the simulation process, the density matrix pij and 
the energy density matrix tt^ appear repeatedly in the 
calculation of the MuUiken charge, the total energy and 
forces f2S. whose definition may be 

-, /. (a)* (a) 

- - Im / deJ2fi^c.)- (17) 

a 

= Y.f{e^)w^r^*wf\ (18) 

a. 

(q)+ (a) 

TTij = (--)lm / fe^/(£^) ' _ ' (19) 

a 

= Y.!{e^)e^wf^*w'^^\ (20) 

where /(e) is the Fermi-Dirac function /(e) = {1 + 
exp((e — /i)/T)}~^, where \i and r are the chemical po- 
tential and temperature. 



Moreover, any physical property can be expressed by 
using the density matrix as 

(X) = J dsf{s)Y,X^MiS-'Giz)S-'),, 

- 2^X,,p,,. (29) 

The expressions Eqs. (|27l) and (|28p and also ([29]) are sat- 
isfied not only in the whole Hilbert space but also in 
the mapped subspace in which we construct approximate 
eigen-states. 

Now we have obtained three different expressions 
Eqs. (EI]) -(1231) for Ntot and Eqs.dMll'-dlHl) for Et^f 
These expressions normally give different values, because 
we usually use finite values of the energy interval, rj and 
approximate eigen-states in the mapped subspace. For- 
tunately, if the formula pijHji — SijTTji is satis- 
fied, the consistency between the total band energy and 
the force can be kept as shown in Appendix |B] 

III. GENERALIZED SHIFTED COCG METHOD 



2. Physical property 

The chemical potential fi should be determined by the 
equation for the total electron number Ntot- 

iVtot = 2^/ defie)Du{e) (21) 

i 

^ ^ (a)* (a) 

/ 2 \ f w ■ w 

= - - Im / defies) ' _ J (22) 

= '^^S^JPJ^, (23) 

where a factor "2" is the spin degeneracy. 

The total band energy of the system is given as 

occ 

Etot = 2^e„ = 2^e„/(£,) (24) 

a a 

= -^ImE / deef{e){G{z)S-%, (25) 

i 

where the summation Y^'^'^ runs over the occupied states. 
This equation can be expressed by the density of states, 
the density matrix or the energy density matrix as 

^tot = 2^ / d£ef{£)Du{e) (26) 

= 2Ep^,i^,^ (27) 

ij 

= 2j2S.,,n,.,. (28) 



We developed the shifted COCG method for large-scale 
linear equations ([1]) .14, 15,21 j|. ^^^^ shown that the conver- 
gence behavior can be monitored by observing the behav- 
ior of the "residual norm" . The shifted COCG method 
is generalized for Eq.(I2]) in this section. 



A. Definition of the problem 

The eigen-value problem of stationary Schrodinger 
equation is equivalent to the scattering problem; 



{zi-H)^{r)=x{r), 



(30) 



where z — e + irj and e is an energy parameter of incident 
waves. The wavefunction ip{r) is expanded by the set of 
non-orthogonal atomic orbitals {0j}; 



^(r) = ^(t>j{r)xj{z). 



(31) 



Substituting Eq. (PT|) into Eq. (PO)) . one obtains general- 
ized linear equations 



(zS - H)x{z) = b, 



(32) 



where the j-th component of the vector b is bj — {(t>j\x)- 
The solution of the linear equation x{z) is then 



x{z) = [zS - H)-^b = S-^G{z)S-^b 
with a help of Eq. ([T^ . By setting a vector b as 



b = ej = (0,0,' 



,0,^,0,0---)' 



(33) 



(34) 
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we can get the corresponding solution x^(z) as 

x^{z)^ S-^G{z)S-^ej. (35) 

The product of and (not S-product), the i-th el- 
ement of a vector , is identical just to the energy- 
component of the density matrix (e) ; 



1 

-— Im 

TT 



e ■ • x^ (e + ir]) 



which relates to the local DOS as 



(36) 



(37) 



Pij = I rfe/(£)py(£), 
dee/(e)p.y(e). 



Then the density matrix and the energy density matrix 
are given by the integrations of pij{e) as 



(38) 
(39) 



It should be noticed here that there is no quantities of 
eigen-energies in the Krylov subspace and, we should use 
the calculation procedure through pij (e) rather than the 
calculation of Egs. p^ and (1^ . Their resultant values 
depend on the interval of energy mesh-points for the en- 
ergy integration and a fictitious finite value of rj. 



B. Generalized shifted conjugate orthogonal 
conjugate gradient (GsCOCG) method 

For non-orthogonal basis set, we can generalize the 
shifted COCG procedure, named the generalized shifted 
COCG (GsCOCG) method^^ The linear equations of 
the 'seed' energy and the 'shift' energy cr, respectively, 
are written down as 



{S-^A + asl)x = S-'^b, 
(5^M + al)a;« = S'^b, 

where the matrix A is defined as 

A = z,,fS - H 



(40) 
(41) 

(42) 



with an arbitrary reference energy Zj-of = ^rcf + i*?, 1 is 
the unit matrix and b = Bj. The seed energy and the 
'shift energy' are given as = £rcf + o's and e — Crof + cr. 

Following the procedure of the shifted COCG 
method, we try to find iterative n-th solutions a;„ 
in the Krylov subspace defined 



Kn{S-^A^aJ,S-^b) 
= Span{S'-i6, S-^AS'^b, {S'^AfS-^b, 
••• ,{S-^ATS-^h]. 

s-'b 



This yields the residual vector r'^ 
asl)xn to be^^ 

r'^±Kn{{S-^A + asl)lb* 



(43) 
{S-^A + 

(44) 



where is the Hermitian conjugate matrix of B and b* 
is the complex conjugate vector of b. The actual algo- 
rithms may be as follows. Under the initial conditions 



xo = P-i = 0, 
^•o = b, 

a-i = 1, 13-1 



0, 



(45) 
(46) 
(47) 



and a definition = S^^ro, we evaluate the follow- 
ing equations for the 'seed' energy as iteratively for 

n : 



0,1, 2,.-.: 




Pn 


= r',^ + Pn-lPn-l, 




{Pn,S-^{A + (JsS)pn)s 








= r„ - a.n{A + (TsS)Pn, 




— S ^Vn+l, 







(48) 



Important point here is our use of — S'^^r„. In actual 
procedure, we employ a form r„ = Sr'^ at each iteration 
step by CG method. Since the overlap matrix S is real 
symmetric positive definite and sparse, the convergence 
of CG iteration can be fast. 

The basic theorem of the Krylov subspace is the in- 
variance of the subspace under an energy shift a. The 
other very basic theorem is the collinear residua?— 
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V = 



(49) 



Owing to these theorems, once we solve the set of equa- 
tions for the 'seed' energy as, we can obtain the results 
for any shift energy a only by scalar multiplications. The 
recurrence equations for shift energies are given (all the 
quantities are denoted by the superscript cr), with initial 
values tt'^]^ = ttq = 1, as follows; 

<+i = |l + "«('^-o-s)|< + ^^"««"<-i) 
I- J an-i 

(50) 



and 



with 



(51) 



-an, 



'n+1 



Pn 



/3n- 



~7^n + Pn-lPn-l- 
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Partial densities of states are shown in Fig. [T] for a 
system of Au 864 atoms by NRL tight-binding Hamiho- 
nian,— in comparison with those by the exact calcula- 
tions. In order to see the behavior of the peak positions 
and the tail of the peaks, the figures are drawn in the 
logarithmic scale. Two lines of GsCOCG and the ex- 
act calculation overlap each other almost completely and 
one can recognize an excellent agreement between the 
two different calculations. 




Energy ( 8 ) Ry 

FIG. 1: (Color on line) Partial density of states for a system of 
Au 864 atoms by NRL tight-binding Hamiltonian,— normal- 
ized to unity, (a) s-orbitals, (b) p-orbitals and (c) d-orbitals. 
Comparison is for GsCOCC and the exact calculation, which 
are almost identical to each other. Parameters in GsCOCG 
calculations are rj = lO^'^Ry, r — br). The energy interval of 
mesh-points is 10~*Ry. See Figs. |3J[S]and[7]for comparison. 



C. Residual norm and convergence behavior 




'^°0 50 100 150 200 250 300 350 400 450 

iteration step 



FIG. 2: (Color on line) Convergence behavior of residual 
norms for a system of Au 256 atoms by NRL tight-binding 
Hamiltonianz-tt Spectrum extends between —0.5 Ry to 1.5 Ry. 
The inset in (b) show the total density of states D{e) = 
Dii{e), where we use a finite imaginary number in the en- 
ergy and the profile is of dense spiky peaks, (a) Residual norm 
I ri'''^' II at several energy points e = —0.5, 0.0, 0.5, 1.0 Ry 
for s-orbital. (b) Average residual norms Rn ^ with different 
three seed energies (—0.5, 0.0, 0.5Ry) for s-orbital and they 
all overlap with each other. 



The useful characteristic property of GsCOCG method 
is the capability of monitoring the norm of residual vec- 
tors.— The residual vectors for the seed and shift equa- 
tions with an energy Sk (with b = ej and dk — Sk~ £ici) 
are Vn'''"' and r^'''\ respectively, and the 'mapped' resid- 
ual vectors for the seed and shift equations r'n^''^'^ = 
S~^rn'^^ and r'n'^'^^ = S~^r'^'^\ We usually need only 
elements of the density matrix among near-sited orbital 
pairs connected by non-zero elements of the Hamiltonian 
or overlap matrices and the convergence monitoring is 
necessary for these components'^ Therefore, in order to 
monitor the convergence behavior, we adopt the "resid- 
ual norm" defined as 

||r;(^/'=-)f^'|:V*-r-;(^/'='^'f. (52) 

i 

Furthermore, since the residual norm is different among 
different energy points, the average quantity ("average 
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residual norm" ) should be defined as^^ 

where iVenc is the number of energy points. 

The convergence behavior of the residual norm for dif- 
ferent seed energies is shown in Fig. [5^. The convergence 
at the energy of the low DOS is very fast, because the 
eigen-state can be constructed by a small number of basis 
states. The convergence of the averaged norm is shown in 
Fig. which confirms numerically the fact that the av- 
erage residual norm (and all the physical quantities) does 
not depend sensitively on the choice of a seed energy. 

D. Seed-switching technique 

When one chooses a seed energy in an energy range of 
rapid convergence, the spectra at majority energy points 
have not been converged yet and one should restart the 
calculation with a new seed energy as seen in FigI5^. The 
most desirable seed energy may be the one of the largest 
(partial) DOS because the convergence at these point is 
the most slowest. 

However, even if one chooses a starting seed energy in 
the highest DOS region and the residual norm at the seed 
energy reaches the convergence criterion, it often hap- 
pens that there still remain several energy points/regions 
where the residual norm has not been small enough. For- 
tunately the shifting energy does not need any additional 
heavy computational task but several scalar manipula- 
tion as Eg. ([501) . Because of this property of shifting en- 
ergy, a choice of a seed energy can be arbitrary. As 
shown in FiglllD, even if we start with an improper seed 
energy and switch a seed, the total iteration times for 
desired convergence over whole energy range is not very 
different. The seed- switching is very efficient technique to 
avoid restarting the calculation from the beginning with 
a new seed energy ^^i^^ One chooses a new seed energy 
cr™^ and can continue the calculation without discarding 
the information of the previous calculation with the old 
(Ts by using the shift property. Figure [3] shows the behav- 
ior of the residual norms in the seed-switching process. 

IV. GENERALIZED LANCZOS PROCESS AND 
DENSITY OF STATES 

The three-term recursive relation used in GsCOCG 
method leads us to the generalization of the Lanczos 
method i^^'^^'^^ As far as we know, the generalization of 
the Lanczos method was presented first in Ref. [ll. In 
this section, we will stress that the generalized Gram- 
Schmidt reorthogonalization process makes G-Lanczos 




100 200 300 400 500 600 700 800 900 1000 

iteration step 




500 1000 1500 2000 2500 3000 3500 



iteration step 

FIG. 3: (Color on line) Residual norm at the seed energies for 
a system of Au 864 atoms by the NRL tight-binding Hamilto- 
nian,^* in s-orbitals (a) and d-orbitals (b). (a) Seed-switches 
to 0.0 Ry from -0.5 Ry at 36-th step and to -0.299 Ry at 376- 
th step in the s-orbital case, (b) Seed-switches to 0.110 Ry 
from -0.5 Ry at 28-th step and to -0.034 Ry at 1002-nd step 
in the d-orbital case. Once the calculation using one seed 
is converged and full convergence has not been achieved, one 
should choose the next seed and continue the calculation. The 
gray lines show the residual norm by energy shift r'^ before 
seed-switching. 



method practically useful and also the use of Eas. p^ 
and (|20p gives very efficient and accurate results. 



A. Generalized Lanczos process 

First we define a matrix H as 

n = S-^H, (54) 

which is not self-adjoint but still satisfies the quasi- 
Hermitian property in the S-product; 

{v,nu)s = {Hv,u)s. (55) 
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We can construct the three-term procedure of the Lanc- 
zos process {n = 0, 1 • • • ) asi^^ 

where 



n+l 



(56) 



u 



{{H - a„>" 



n+l 



with conditions 5o ~ 0, 6,i > and then the vectors {m™} 
satisfy the S-orthogonahty 



(57) 



This process we call the generalized Lanczos (G-Lanczos) 
process (method). It is well-known that the orthogonal- 
ity relation is broken for larger n in the Lanczos method 
and this is also the case here. We adopt the modi- 
fied Gram- Schmidt reorthogonalization process in order 
to keep the S'-orthogonality. (See the results without 
the modified Gram-Schmidt reorthogonalization process 
in Appendix lAl) 

We then stop the Lanczos process up to n = iV and 
assume = (m = -I- 1, -|- 2, ■ • • ). This procedure 
constructs the Krylov subspace 



KN(n,b) = {b,'Hb,n'b,--- ,W^b) 



(58) 



and the matrix T-i is transformed in this subspace to a 
matrix of a tridiagonal form. 

Starting with a natural basis vP = ej^, one generates 
vectors it" in the Krylov subspace and each vector corre- 
sponds to orthonormalized linear combination of atomic 
orbitals (LCAO); 



^"(r) = ^0,(rX 



(59) 



The normalized eigen-states in the generated Krylov sub- 
space is denoted by 



N 



n=0 j 

which satisfies the Schrodinger equation 



(61) 



where H„„ = (u",-Hm'")s = (M")*iJu™. 



B. Numerical test with NRL Hamiltonian for fee 

Au 

Chemical potential fi can be evaluated by using 
Eqs. (PTjl '^ ipH)) in the generalized Lanczos method. Cal- 
culation of the Green's function uses Ea. (IT5)) having a 



double summation of atomic sites and orbitals and it 
consumes a long CPU time. On the contrary, the cal- 
culation of the density of states by Eq. (ITU) costs less 
CPU time. The computational efficiency will be dis- 
cussed later in Section IVII Here in this subsection, we 
show several evaluated values, the density of states, the 
integrated density of states as functions of energies for a 
system of gold 864 atoms of fee structure described by 
the tight-binding Hamiltonian constructed by Mehl and 
Papaconstantpolous.^- 

Several evaluated values and consistency between them 
are summarized in Table ID The parameters in the gener- 
alized Lanczos method are A^ = 50, the convergence crite- 
rion S — lO^^Ry in the inner CG process of r' = S~^r. 
In GsCOCG, the imaginary small energy rj = 10~'^Ry, 
the total number of the energy integration mesh-points 
is 3,000, and the convergence criterion S = 10~®Ry in 
the inner CG and outer iteration procedures. The dif- 
ference of the calculated total energy is of the order of 
10^^ ^ lO^'^Ry. The scale of the band energy is 1 Ry 
and the relative error may be of 10"'^. The bold num- 
bers in Table U are a set of consistent values in each case 





1.5 2 .1 .0.5 

Energy ( s ) Ry 

FIG. 4: (Color on line) Partial density of states (pDOS), nor- 
malized to unity, and integrated density of states (IDOS) for 
a system of Au 864 atoms by NRL tight-binding Hamilto- 
nian.'^* Comparison is between those of the generalized Lanc- 
zos method (red solid lines) and those of the exact ones (blue 
solid lines), (a-1) and (a-2) :pDOS and IDOS for s-orbitals. 
(b-1) and (b-2) :pDOS and IDOS for p-orbitals. (c-1) and 
(c-2) :pDOS and IDOS for d-orbitals. 77 = 5.0 x 10"^Ry 
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TABLE I: The generalized Lanczos process applied to a system of 864 atoms of fee Au and comparison with that by GsCOCG. 
The Hamiltonian is the NRL tight-binding form.'^'* The chemical potential and the total energy are in Ry unit and A^atom is 
the number of atoms in the system. 





G-Lanczos 


GsCOCG 






/i by Eq.(121J) /i by Eq.(1221l by Eq.(|23p 




p by Eq.([2T)) 






0.3010 8669 0.2926 6528 0.2870 0985 




0.3006 0832 


Eq.iai 

Atot/2Aatom Eq.(122J 
Eq.llaj 




5.5000 0001 5.4809 9705 5.4674 0401 
5.5180 6152 5.4999 9998 5.4865 0739 
5.5316 2252 5.5135 2184 5.4999 9999 


Eq.dm 


5.5000 0000 


Eq.(I2SI 

£:tot/2Aatcm Eq.(l27j 

Eq.JHJ 


p by Eq.fTTl) 
p by Ea.(fT5|) 
TT by Ea.((T9)l 
TT by Eg. ((20)1 


-0.1436 6686 -0.1492 6754 -0.1531 8294 
-0.1311 6499 -0.1364 7104 -0.1403 4132 
-0.1316 2401 -0.1369 4155 -0.1408 2023 
-0.1311 6490 -0.1364 7094 -0.1403 4123 
-0.1316 2391 -0.1369 4146 -0.1408 2014 


Eq.(|2SJ 
p by Eq.(|38j 

TT by Eg. ((39)) 


-0.1436 4084 

-0.1251 7255 

-0.1436 4084 



(for each equation determining the chemical potential) 
and the combination of Eqs. ((TT)) and ((T9)) . and that of 
Eqs. (HHj) and ((^0)) are consistent pairs of data. This con- 
sistency between the density matrix and energy density 
matrix is crucial for consistency between the total energy 
and force f Appendix (B)) . 

In GsCOCG method, calculation of Eqs. dSHj) and ((39)) 
needs e-integration of Pij{s) and the numerical integra- 
tion causes a certain error in the integration of the tail of 
the spectra. Once we reduce the value of rj and increase 
the e-points of the integration, this discrepancy can be 
reduced. In the generalized Lanczos method, the com- 
bination of Eqs. ([23]), (HD) and ([20]) is the best scheme, 
since we do not use the numerical energy integral. 

Figure ID shows the partial density of states and inte- 
grated density of states for s-, p-, and d-orbitals. Those 
by the generalized Lanczos method are compared with 
exact results. The peak positions and the tails over- 
lap excellently and we can conclude that the generalized 
Lanczos method can be powerful and convenient tool. 
One must notice that the exact results are more smooth 
in the central energy region, since the correct density of 
states is more dense in this energy region. In other words, 
the exact results reserve entire profile with 864 x 9 peaks 
but, on the contrary, those by the generalized Lanczos 
method reserve the profiles with 51(= N + 1) peaks or 
the density of states is expressed as a polynomial function 
of energy of the order 51 in the present calculation. 



H, asi^^ 

r+i = Hu" (62) 

n 

fc"+i = - ^ M'"(M™,r)s (63) 

l.n + 1 

= -. (64) 

This is the Arnoldi process and we call it the generalized 
Arnoldi (G-Arnoldi) method. The generalized Arnoldi 
method generates the Krylov subspace 

KN+iiH;b) = Speiii{b,Hb,H^b,--- ,H'^b} 

= Span{M°,M\M2,---M^}. (65) 

The generated vector corresponds to orthonormalized 
LCAO 

^™(r) = ^</.,(r)«7 (66) 

j 

as in G-Lanczos method. An eigen- function tp°'{r) = 
J2n '/'"(''")Qn"'' satisfies the Schrodinger equation 

J^HnmQl:^ ^S^Qi"^ (67) 
m 

where Hmn = 1^1'/'") = {u"^)*'Hu^ and H is an up- 
per Hessenberg matrix. We can say that this procedure 
is a kind of generalization of the subspace diagonalization 
of the Krylov subspace developed beforeJ^ 



V. GENERALIZED ARNOLDI PROCESS AND 
DENSITY OF STATES 

A. Generalized Arnoldi process 

We can construct the Krylov subspace, starting with a 
natural basis m° = ejg , by using the Hamiltonian matrix 



B. Numerical test with NRL Hamiltonian for fee 

Au 

We summarize, in Table [ill several evaluated values 
and consistency between them, in comparison with the 
results of GsCOCG. The system is of 864 atoms of fee Au 
by NRL tight-binding Hamiltonian.^"* The bold numbers 
in the table are a set of consistent results in each calcula- 
tion of the chemical potential. A'' = 50 for the generalized 
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TABLE II: The generalized Arnoldi process applied to a system of 864 atoms of fee Au and comparison with that by GsCOCG. 
The Hamiltonian is the NRL tight-binding form.'^* The chemical potential and the total energy are in Ry unit and A^atom is 
the number of atoms in the system. 





G- Arnoldi 


GsCOCG 






/i by Eq.(121J) /i by Eq.(1221l by Eq.(|23p 




p by Eq.([2T)) 






0.3039 0679 0.2928 7198 0.2864 6136 




0.3006 0832 


Eq.(l2TI 

A^tot/2Aratom Eq.(122J 
Eq.llaj 




5.5000 0002 5.4797 1790 5.4658 1773 
5.5176 0887 5.5000 0001 5.4865 0981 
5.5311 6653 5.5135 1949 5.4999 9999 


Eq.dm 


5.5000 0000 


Eq.(I2SI 

Etot/2N^tom Eq.(l27j 

Eq.JHJ 


p by Eq.fTTl) 
p by Ea.(fT5|) 
TT by Ea.(fT9)l 
TT by Eg. ((20)1 


-0.1423 0073 -0.1482 6563 -0.1522 3179 
-0.1300 1076 -0.1351 4434 -0.1389 5498 
-0.1304 6635 -0.1356 1105 -0.1394 2996 
-0.1300 1076 -0.1351 4434 -0.1389 5498 
-0.1304 6635 -0.1356 1105 -0.1394 2996 


Eq.dsg 

p by Eq.dMJ 
TT by Eg. ((39)) 


-0.1436 4084 

-0.1251 7255 

-0.1436 4084 




Energy ( e ) Ry 

FIG. 5: (Color on line) Partial density of states, normalized 
to unity, and integrated density of states. Comparison is be- 
tween those of the generalized Arnoldi method (red solid lines) 
and those of the exact ones (blue solid lines), for a system of 
Au 864 atoms by NRL tight-binding Hamiltonian.^* (a-1) and 
(a-2) :pDOS and IDOS for s-orbitals. (b-1) and (b-2) :pDOS 
and IDOS for p-orbitals. (c-1) and (c-2) :pDOS and IDOS for 
d-orbitals. 77 = 5.0 x 10~^Ry. 



Arnoldi process. Data of GsCOCG are the same as in Ta- 
ble m The calculated values of the total energies agree 
with those by the generalized Lanczos method and the 
overall difference is less than 1% as shown in Tables ID 

Figure |S] shows the partial density of states and the 
integrated density of states as functions of energies. The 
peak positions are deviated slightly from those by the 



exact calculation, which one could make smaller with in- 
creasing a dimension N of the Krylov subspace. 



VI. COMPARISON AMONG GSCOCG, 
G-LANCZOS METHOD AND G-ARNOLDI 
METHOD 

A. Convergence 

The dimension of the Krylov subspace in GsCOCG, 
G-Lanczos or G- Arnoldi methods equals to iV -f 1 where 
N is the maximum iteration step. GsCOCG method is 
very accurate method if one achieves the iteration to have 
enough small residual norm (e.g. 5 — 10~^Ry). In Figl^J 
we have shown the convergence behavior of the residual 
norms with different seed energies and until much smaller 
convergence region. One should use the same iteration 
criteria both in the inner CG and outer procedures in 
GsCOCG method. It sometimes happens that the resul- 
tant DOS shows an un-physical behavior, e.g. negative 
values of DOS, if one stops the iteration steps before 
enough convergence in GsCOCG method. On the other 
hand, G-Lanczos and G- Arnoldi methods never give such 
un-physical DOS even if one stops at small iteration step 
because of the expression of Eq. ((T5)) . Furthermore, the 
first N moments are preserved correctly in the energy 
spectra of G-Lanczos method. In the spectrum of the 
present model by the exact calculation, we observe about 
forty prominent peaks and then we use A'^ = 50 in the cal- 
culations of G-Lanczos and G- Arnoldi procedures. This 
is presumably the reason why the peak positions and de- 
tailed profiles in the spectra of G-Lanczos method show 
excellent agreement with those of the exact calculation. 
G-Lanczos method needs the Gram-Schmidt reorthogo- 
nalization and also it is necessary to have enough con- 
vergence in the inner CG procedure. G- Arnoldi method 
does not need such reorthogonalization procedure since 
one solve the eigen- value problem in that subspace. 

In condensed matters, the width of the valence and/or 
conduction bands W may be of the order of IRy. Then, 
when the number of atoms is iVatom, we can estimate 
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TABLE III: Chemical potential and the total energy (in Ry 
unit) of systems of 864 and 256 atoms of fee Au'^* by three 
different method. The values of p in the G-Lanczos and G- 
Arnoldi methods are evaluated by Eg. (1181) . 



G-Lanczos 


(864 atoms) (256 atoms) 


p Eq.(|22|) 

Stot/2iV,tom Eq.dlTl 


0.2926 6528 0.2875 4133 
-0.1364 7104 -0.1358 7639 


G-Arnoldi 


(864 atoms) (256 atoms) 


p Ea.(|22ll 

£tot/2Af,tom Eq.dlTl) 


0.2928 7198 0.2886 2149 
-0.1351 4434 -0.1346 6195 


GsCOCG 


(864 atoms) (256 atoms) 


M Eq.lST]) 

Stot/2iVatom Eq.dlSI 


0.3006 0832 0.2947 1535 
-0.1436 4084 -0.1433 1925 



the separation of each energy level as of the order of 
W/ (9 X A^atom)- Presumably 10% of this separation would 
be an enough accuracy in the energy scale. In our present 
case, with A^atom ~ 1,000, the convergence criterion can 
be chosen as 0.1 x W/{9 x A^atom) ~ 0.1 x 1/9,000 « 
10~^Ry. We also observed that the maximum iteration 
steps are almost the same for the convergence criterion 
6 = IQ-^ and 5 = 10"^ in GsCOCG and G-Lanczos 
methods. This is the reason why we choose 5 — lO^^Ry. 

We compare the results of the chemical potential and 
the total band energy of systems of 256 and 864 atoms 
in Table IIIII The difference of the chemical poten- 
tial /i is of the order of 10~^Ry, and that of the to- 
tal band energy i?tot is of the order of 10~*Ry. The 
level separation in 256 atom system can be estimated as 
1.0/(9 X 256) sa 4 X lO'^Ry and that in 864 atom system 
lO^'^Ry. The difference in the chemical potential of two 
systems of different sizes is due to the difference of total 
number of levels which changes the value of the chemical 
potential sensitively. On the other hand, the difference of 
ii^tot is just the quantity related to the overall spectrum 
and we can see an excellent convergence of the results of 
A^ = 50. 

Figure [6] shows the actual convergence behavior of 
IDOS by G-Lanczos and G-Arnoldi method. The agree- 
ment between the results by G-Lanczos or G-Arnoldi 
methods and those by the exact calculation is excellent 
both for systems of 864 atoms and that of 256 if we 
adopt A^ — 50. The most apparent difference appears 
in the IDOS curves of exact calculation of systems of 256 
atoms and 865 atoms in the mid energy region, and the 
calculated results by our present methods present this 
difference with complete fidelity. 



(a) 

Exact fi64 atoms 


y Lanczos 864 atoms 

Lanczos 256 atoms 









(b) 

Exact 864 atoms 

Exact 256 atoms 


Amoldi 864 atoms 

Jf Arrioldi 2.56 ;ilom^ 















-?l.5 0.5 1 



Energy (Ry) 

FIG. 6: (Color on line) Comparison of IDOS of d-orbitals 
for systems of An 256 and 864 atoms by NRL tight-binding 
Hamiltonian.^^ (a) G-Lanczos (solid line) and exact calcula- 
tion (chain line) for 864 atoms (red) and 256 (blue) atoms, 
(b) G-Arnoldi (solid line) and exact calculation (chain line) 
for 864 atoms (red) and 256 (blue) atoms. The agreement 
between the results by G-Lanczos or G-Arnoldi methods and 
those by the exact calculation is excellent. 



method for the NRL Hamiltonian of fee Au system of 256 
and 864 atoms. The total number of orbitals equals to 
nine times of the total number of atoms (1 s, 3 p's and 5 
d's). We use, in the inner CG process of the generalized 
Lanczos (G-Lanczos) method, the convergence criterion 
5 = lO^^Ry. Two numbers in the row of the CPU time 
are referred to those of A^ = 50 and A^ = 100, respec- 
tively, for G-Lanczos and G-Arnoldi methods, though the 
results of A^ = 100 almost coincide with those of N = 50. 
For GsCOCG (with shifted 3,000 energy points), the data 
shown here are those of (5 = 10^*^Ry both for in the in- 
ner and outer iteration processes. The repeated time of 
the inner CG process {S'^x part) in GsCOCG and G- 
Lanczos method is 10 ^ 11. (Repeated time of 25 ~ 27 
is needed for S — 10~^^Ry.) 

The system size dependence of the CPU time is lin- 
ear for the generalized Arnoldi (G-Arnoldi) method and 
the generalized Lanczos (G-Lanczos) method, bilinear for 
GsCOCG method and cubic for the exact calculation. 
The generalized Arnoldi method is extremely efficient in 
electronic structure calculations of extra-large systems 
with several hundred thousands atoms. 



B. CPU times 

We summarize, in Tables IIVI the CPU times (by 
using single CPU of the standard workstation) for (s- 
orbitals) Da of the generalized Lanczos and the gener- 
alized Arnoldi methods with Eq.dH]), that of GsCOCG 
with Eg. ([57)1 ■ and that of the exact diagonalization 



C. Applicability to large-scale electronic structure 
calculations and MD simulations 

In the exact calculation and GsCOCG method, calcu- 
lations of physical properties, such as the density ma- 
trix, the energy density matrix, and chemical potential. 
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TABLE IV: CPU times by using a standard single CPU work- 
station, for (a) a system of gold 256 atoms (Au256) and (b) 
gold 864 atoms (Au864) by NRL tight-binding Hamiltonian.— 



Id) IWA ZuU 




CPU-times (s) 


G-Arnoldi 


main part 


0.52 


1.11 






total 


1.59 


3.20 






1 n n OT I 
iliiici VyVjr 


2.04 


3.89 






main part 


2.89 


5.60 






total 


3.92 


7.62 












13.67 




shifted 






7.53 




total 






21.20 


7 








57.60 


[O) AU 804 




CPU-times (s) 


G-Arnoldi 


main part 


1.94 


3.76 






total 


3.00 


6.00 




G-Lanczos 


inner CC 


7.93 


15.13 






main part 


10.87 


21.13 






total 


11.93 


23.31 




GsCOCG 


seed 






140.89 




shifted 






108.36 




total 






249.15 


Exact 








2111.51 



require the numerical integration as in Eqs. (|2ip . p6|. 
([55)1 and (pn)) . Therefore, in order to keep high accu- 
racy, the integration needs fine energy mesh points. On 
the other hand, the generalized Lanczos or the general- 
ized Arnold! methods use the simple summation of the 
eigen-states in the mapped subspace in Eqs. (|22|) . (f23l) . 
and (fT7|) ^ (pO)) . These two methods do not consume the 
CPU time and give stable values of the density matrix 
and the energy density matrix. 

The CPU times per one MD-step are, for the present 
models, a few seconds by the generalized Lanczos method 
and the generalized Arnoldi method. From the above 
comparison among various viewpoints, we can conclude 
that the generalized Lanczos method or the generalized 
Arnoldi method are very suitable to large-scale electronic 
structure calculations and MD simulation of several tens 
of thousands atoms and a long MD-steps. On the other 
hand, GsCOCG method can give excellently rigorous re- 
sults with more CPU times and may be applicable to 
problems of a fixed atomic configuration (but not for the 
MD simulation). 

GsCOCG method is based on the three-term recur- 
sive equations and we need store three generated vec- 
tors at each recursive process. Of course, when the size 
of the Hamiltonian and overlap matrices are extremely 
large and the memory size becomes a serious obstacle, 
(though much smaller consumption than the exact diag- 
onalization method, ) we should invent other method of 
much faster convergence and smaller cost of memory size. 

The convergence criterion 5 = 10~^ might corresponds 
to the range of neighboring 1,000 atoms, as already dis- 



cussed, and we do not observe any clear difference be- 
tween results by the present methods and the exact cal- 
culations. Even when we should discuss some physics 
of nano-scale systems, the electronic structure is deter- 
mined by some nearby surroundings. This idea we call 
near-sitedness^ Even when we have to deal with a much 
larger systems, we can use smaller interaction range than 
the system size due to the near-sitedness. Presumably 
more serious problem of the system size in some specific 
problems, for examples, entire calculation of nano-devise 
or the electron-strain field interaction such as fracture 
propagationii^ and dislocationi^ 



VII. CONCLUSION 

We have derived several efficient and accurate algebraic 
methods to calculate the Green's functions, total/partial 
density of states and total band energy in case of non- 
orthogonal atomic orbitals. The method is very general. 

We have investigated the accuracy and efficiency by 
showing numerical data with different numerical proce- 
dures. GsCOCG is very accurate with less consumption 
than the exact diagonalization but may not be appropri- 
ate for long MD-step simulations. The generalized Lanc- 
zos method becomes applicable to actual large systems 
with the modified Gram-Schmidt reorthogonalization to 
maintain the orthogonality of generated basis vectors. 
Then, the generalized Arnoldi method and the general- 
ized Lanczos method are accurate and efficient, and their 
CPU times depend linearly upon the system size. There- 
fore, these two methods would be the most suitable to 
the large-scale electronic structure calculations and MD 
simulations. A crucial point we should point out finally is 
the fact that G-Lanczos and G-Arnoldi methods do not 
adopt any numerical integration in energy which leads 
additional numerical error. 
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Appendix A: Re-orthogonalization by modified 
Gram-Schmidt method 



The three-term recursive relation in the generalized 
Lanczos method guarantees theoretically the automatic 
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(a-2) 




(b-2) 








(c-2) 



2 -1 4.5 1 0.5 1 1.5 2 

Energy ( £ ) Ry 

FIG. 7: (Color on line) Partial density of states and integrated 
ones without the modified Gram-Schmidt reorthogonalization 
of the generalized Lanczos method (red solid lines) and those 
by the exact ones (blue solid lines) for a system of Au 256 
atoms by NRL tight-binding Hamiltonian.— A'^ = 50. (a-1) 
and (a-2) :pDOS and IDOS for s-orbitals. (b-1) and (b-2) 
:pDOS and IDOS for p-orbitals. (c-1) and (c-2) :pDOS and 
IDOS for d-orbitals. One should notice that the normalization 
of the integrated density of states is broken without reorthog- 
onalization procedure and that some "ghost" peaks appear 
due to incorrect mixing of states. r\ = 5.0 x 10~^Ry. 



S-orthogonalization. However, the orthogonality is bro- 
ken in the numerical calculation procedure. This prob- 
lem causes several troubles such as the existence of con- 
stant background of error in the spectrumfi^ appearance 
of "ghost" structure in spectrum due to erroneous mix- 



ing of states and a broken normalization of the partial 
density of states. 

Figure [7] shows the examples of this broken orthonor- 
mality, in a system of 256 atoms of fee Au by using the 
NRL Hamiltonian. One can see the "ghost" peaks (e.g. 
at e ~ 1.8Ry in (a-1), at e ~ 1.25 1.3Ry in (b-1) and 
(c-1)) and broken normalization (e.g. in (a-2) and (b-2)). 
These problems are solved by the re-orthogonalization 
with the modified Gram-Schmidt method. 
Appendix B: Consistency between the total energy 
minimum and a vanishing force 

We should construct our eigen-states in a small sub- 
space and a certain numerical error is unavoidable in eval- 
uated total energy and force. Even in that case, the con- 
sistency between the total band energy minimization and 
an vanishing atomic force is the most important in the 
electronic structure calculation in equilibrium atom con- 
figuration. In the framework of the tight-binding model, 
the force (due to band energy) acting on an atom / is 
evaluated by a formula 



P^j 



dRi 



'dRi 



(Bl) 



which can be rewritten, with only an assumption of the 
eigen-state property Eq.([5]) in the mapped subspace, as 

ij ij 

d 



dRi 



(B2) 



Therefore, calculated atomic and electronic configura- 
tion of the minimum total energy is consistent with that 
of vanishing atomic force, if the identity '^ijiPijHij) = 
'^iji'^ijSij) is satisfied always in any atomic configura- 
tion. It should be noticed that the above equality is sat- 
isfied in the mapped subspace as described in lllC 21 It is 
important in actual calculating procedure that we should 
use the consistent pair of equations as Eqs. (fT7| and (fT9)) 
or Eqs. pS]) and 
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